Biological Pattern Discovery with R Machine Learning Approaches (Zheng Rong Yang)

by $R_4R_3R_2R_1R_1'$. The cleavage happened between $R_1$ and $R_1'$. Each peptide of length $d$ was denoted by a vector $\mathbf{x} \in \mathcal{C}^d$, where $\mathcal{C}$ was the set of the amino acids. In most protease cleavage data analysis, all peptides had the same length. A set of peptides was denoted by $\Omega_A \cup \Omega_B$. $\Omega_A$ was used to denote a set of the non-cleaved peptides and $\Omega_B$ was used to denote a set of the cleaved peptides.

Before starting the description of how to use GP to model factor Xa cleavage data, the first thing was how to quantify the similarity between residues, which were the amino acids, from two peptides. The bio-basis function [Thomson and Yang, 2002; Thomson et al., ...] was used to measure how a rule fitted a data point (an amino acid). The $m$-th residue of a peptide $\mathbf{x}$ was denoted by $x_m$ and the $m$-th residue defined in a GP rule $\mathbf{r}$ was denoted by $r_m$. The fitness was defined as below, where $\pi(\cdot,\cdot)$ was the mutation probability between two amino acids based on a mutation matrix,

$$\pi(x_m, r_m) \tag{8.11}$$

Suppose a number of residues between a peptide and a rule were involved in decision-making. The min-max function was developed for this kind of analysis [Yang et al., 2003]. The min function was defined as below, where $\mathbf{x}$ was a peptide and $\mathbf{r}$ was a GP rule,

$$\psi^{+}(\mathbf{x}, \mathbf{r}) = \min_{m} \{\pi(x_m, r_m)\} \tag{8.12}$$
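The min function in (8.12) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the table `TOY_PI` holds made-up probabilities rather than values from a real mutation matrix, the lookup is assumed to be symmetric, unlisted pairs fall back to the hypothetical `DEFAULT_PI`, and the rule is stored as a mapping from a 0-based residue position to the amino acid expected there. The peptide strings in the usage note are likewise invented.

```python
from typing import Dict, Mapping, Tuple

# Hypothetical toy values of pi(a, b); NOT taken from a real mutation matrix.
TOY_PI: Dict[Tuple[str, str], float] = {
    ("Y", "Y"): 0.9,
    ("Y", "F"): 0.6,
    ("S", "S"): 0.8,
    ("S", "T"): 0.5,
}
DEFAULT_PI = 0.1  # assumed fallback for pairs missing from the toy table


def pi(x_m: str, r_m: str) -> float:
    """Toy mutation probability between two amino acids (assumed symmetric)."""
    if (x_m, r_m) in TOY_PI:
        return TOY_PI[(x_m, r_m)]
    return TOY_PI.get((r_m, x_m), DEFAULT_PI)


def psi_min(peptide: str, rule: Mapping[int, str]) -> float:
    """Min function: the smallest pi(x_m, r_m) over the residues the rule uses.

    `rule` maps a 0-based residue position m to the amino acid r_m.
    """
    if not rule:
        raise ValueError("a rule must use at least one residue")
    return min(pi(peptide[m], r_m) for m, r_m in rule.items())
```

For a rule that expects Y at the first residue and S at the fourth, `psi_min("YIGSR", {0: "Y", 3: "S"})` gives 0.8 with the toy table, because the weaker of the two matches decides the fitness; `psi_min("FIGTR", {0: "Y", 3: "S"})` gives 0.5.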

To express the min function, an RPN chromosome was expressed as below, where $r_m$ was the $m$-th residue of the GP rule $\mathbf{r}$ and $c_m$ was the amino acid used by $r_m$,

$$\psi^{+}(\mathbf{x}, \mathbf{r}) = \Big\{\prod (r_m c_m)\Big\}+ \tag{8.13}$$

Using the RPN chromosome to represent a rule, the residue indexes in a peptide were encoded using letters, such as a, b, c, etc. For instance, an example of the min function was (aYdS)+. In this example, a and d were the first and the fourth residues of a peptide while Y and S were the two amino acids used in this rule for the two residues, respectively. The fitness of this